Summary

In  rehearsing  specific  missions,  soldiers  frequently  must 
learn  about  spaces  to  which  they  have  no  direct  access. 
Virtual  Environments  (VE)  representing  those  spaces 
can  be  constructed  and  used  to  rehearse  the  missions,  but 
how  do  we  ensure  their  effectiveness?  The  US  Army 
Research  Institute  was  among  the  first  to  demonstrate 
that  spatial  knowledge  acquired  in  a virtual  model  of  a 
building  transferred  to  the  real  world.  While  route 
knowledge  was  readily  acquired  in  a VE,  configuration 
knowledge  (distance  and  direction  to  locations  not  in  the 
line-of-sight) was not. Spatial learning in the VE was
hampered  not  only  by  disorientation  resulting  from  a 
narrow  FOV  and  multiple  collisions  with  walls,  but  also 
by  participants’  inability  to  accurately  estimate  distances 
in  VEs.  Poor  distance  estimation  in  VE  was  linked  to  the 
reduced  VE  FOV  and  to  verbal  report  procedures  for 
making  the  estimates.  Some  improvement  in  distance 
estimates  was  obtained  by  adding  auditory  compensatory 
cues for distance and by using the non-visually guided
locomotion technique for obtaining distance estimates.
Armed  with  knowledge  that  some  VE  characteristics 
adversely  affect  distance  estimation  and  configuration 
learning,  we  conducted  research  to  determine  if  unique 
capabilities  of  VEs  could  compensate  for  those 
characteristics.  We  developed  three  VE  navigation 
training  aids:  local  and  global  orientation  cues,  aerial 
views,  and  division  of  the  VE  into  distinctive  themed 
quadrants.  The  aids  were  not  provided  when  testing 
configuration  knowledge.  Training  included  a guided 
tour,  free  exploration  of  the  VE  and  searching  for 
designated  rooms.  Configuration  knowledge  tests 
included  a shortest  route  test,  a pointing  task,  and  a map 
construction  task.  An  aerial  view  was  the  most  effective 
navigation  aid,  though  its  effectiveness  depended  on  how 
it  was  used.  Those  participants  who  used  aerial  views  to 
organize  the  VE  and  learn  its  layout  during  free 
exploration  performed  quite  well,  while  participants  who 
used  it  as  a crutch  to  locate  a particular  destination 
performed  worse  than  those  without  an  aerial  view.  To 
ensure  that  VEs  train  effectively,  we  must  recognize 
VEs’  deficiencies,  compensate  for  deficiencies  whenever 
possible,  and  exploit  VEs’  unique  training  capabilities. 

Introduction 

The  U.S.  Army  has  invested  heavily  in  the  use  of  virtual 
environments (VE) to train combat forces, to evaluate
new systems and operational concepts, and to rehearse
specific  missions.  While  the  Army  has  focused  mainly 
on  simulations  for  mounted  combat,  there  is  also  a need 
to  train  infantry  and  other  dismounted  soldiers.  In 
training  dismounted  soldiers  there  are  occasions  (e.g., 
rehearsing  a hostage  rescue  mission)  in  which  the 
soldiers  must  learn  about  strategically  important  spaces 
to  which  they  have  no  immediate  access.  Virtual 
environments  can  be  constructed  as  a substitute  for  these 
spaces,  but  how  effective  are  they?  This  paper  describes 
a series  of  experiments  that  investigated  the  limitations 
of  using  VE  for  training  spatial  knowledge  and  how  VE 
might  be  improved  to  meet  Army  human  performance 
goals. 

Although  VE  technologies  such  as  helmet-mounted 
visual  displays,  head  trackers,  3-D  sound  systems,  haptic 
devices,  and  powerful  graphics  image  generators  have 
the  potential  to  immerse  dismounted  soldiers  directly  in 
virtual  training  environments,  their  capability  to  provide 
effective  training  has  yet  to  be  ascertained.  The  effective 
use  of  VE  for  training  requires  more  than  just  VE 
hardware  and  software.  It  also  requires  a body  of 
knowledge  that  identifies  the  characteristics  of  VE 
systems  that  are  required  to  provide  effective  training 
and  the  training  strategies  and  features  that  are  most 
appropriate for use with VE. In order to develop this
body of knowledge, in 1992 the U.S. Army Research
Institute for the Behavioral and Social Sciences (ARI)
Simulator Systems Research Unit initiated a program of
experimentation to investigate the use of VE technology
to train dismounted soldiers.

Experiment  1:  Transfer  of  Spatial  Knowledge 

We  were  among  the  first  to  conduct  research 
demonstrating  transfer  of  spatial  knowledge  from  VE  to 
a real  world  environment  (Witmer,  Bailey,  Knerr,  & 
Parsons,  1996).  For  this  research,  a detailed  model  of  a 
large  office  building  was  constructed  using  Multigen  and 
World  Tool  Kit.  The  model  was  rendered  using  a Silicon 
Graphics  Crimson  Reality  Engine  and  displayed  via  a 
Fake Space Lab Boom. The Boom consists of a
high-resolution binocular display on the end of an arm that
allows six-degree-of-freedom movement, with thumb
buttons for controlling forward and backward motion.


1 For  correspondence  with  author:  Bob_Witmer@stricom.army.mil. 
The participants were sixty college students who had no
previous  exposure  to  the  building.  Participants  first 
studied  route  directions  and  photographs  of  landmarks, 
either  with  or  without  a map,  then  were  assigned  to  one 
of  three  rehearsal  groups.  These  were  (1)  a VE  group 
that  rehearsed  in  the  building  model,  (2)  a building 
rehearsal  group  that  rehearsed  in  the  actual  building,  and 
(3)  a symbolic  rehearsal  group  that  relied  on  verbal 
rehearsal  of  the  route  directions.  Participants  were  then 
tested  in  the  real  world  building  for  transfer  of  route 
training. 

Differences in training transfer were evaluated using a
MANOVA with rehearsal mode, map, and gender as the
independent variables. Only the main effect for rehearsal
mode  was  significant  (p<.001).  A follow-up  ANOVA 
indicated  that  this  effect  was  significant  for  each  of  the 
dependent  measures:  route  traversal  time  (p<.001); 
number  of  wrong  turns  (p<.001);  and  total  distance 
traveled  (p<.05).  Participants  trained  in  the  building 
made  fewer  wrong  turns  (t=3.25,  p<.005)  and  traveled 
less  distance  (t=2.9,  p<.01)  than  did  participants  who 
were  trained  in  the  virtual  environment  (VE).  VE 
participants,  in  turn,  made  fewer  wrong  turns  (t=-4.77, 
p<.001)  and  took  less  time  to  traverse  the  route  (t=-5.82, 
p<.001)  than  those  who  were  trained  symbolically. 

In  practicing  the  route,  participants  were  expected  to 
acquire some knowledge about the overall layout of
the building (i.e., the building configuration). Configuration
knowledge  was  measured  using  the  projective 
convergence technique (Siegel, 1981; Kirasic, Allen, &
Siegel,  1984)  and  by  measuring  the  capability  of  subjects 
to  exit  the  building  quickly  using  an  unrehearsed  route. 
The  projective  convergence  technique  requires 
participants  to  estimate  the  distance  and  direction  to 
target  locations  not  in  the  line  of  sight,  and  uses  these 
estimates  to  determine  the  participant’s  perceived  target 
location.  The  participants  either  draw  lines  to  indicate 
the  distance  and  direction  to  targets  (in  a non-immersive 
mode)  or  point  to  indicate  bearing  and  verbally  report 
their  distance  judgments  in  standard  or  metric  units  (in 
an  immersive  mode).  Errors  in  estimated  bearing  and 
distance  using  this  method  may  either  be  due  to  poor 
distance  estimation  skills  or  disorientation  and  a lack  of 
knowledge  regarding  the  designated  target  location. 
Hence  it  is  not  a pure  distance  estimation  measure. 
MANOVA  was  used  to  assess  differences  in  the  amount 
of  configuration  knowledge.  Surprisingly,  there  were  no 
significant  differences  among  the  various  rehearsal 
conditions (p=.135) and no significant differences as a
function of map use (p=.688). Only the effect of gender
was  significant,  with  males  performing  better  than 
females  (p=.015).  No  significant  interactions  were 
found. 
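
The projective convergence computation described above can be sketched as follows. This is a minimal illustration, not the scoring procedure used in the experiment: the bearing convention (degrees clockwise from the +y axis), the averaging of implied points across sighting stations, and all names and data values are assumptions made for the example.

```python
import math


def implied_target(station, bearing_deg, distance):
    """Return the (x, y) point implied by one bearing-and-distance estimate.

    ASSUMPTION: bearing is measured in degrees clockwise from the +y axis
    ("north"). The paper does not specify a convention.
    """
    theta = math.radians(bearing_deg)
    x = station[0] + distance * math.sin(theta)
    y = station[1] + distance * math.cos(theta)
    return (x, y)


def perceived_location(estimates):
    """Average the points implied by estimates from several stations.

    ASSUMPTION: a simple mean is used to combine the estimates.
    Each estimate is (station_xy, bearing_deg, distance).
    """
    points = [implied_target(s, b, d) for s, b, d in estimates]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def miss_distance(perceived, true_target):
    """Straight-line distance between perceived and true target location."""
    return math.hypot(perceived[0] - true_target[0],
                      perceived[1] - true_target[1])


# Hypothetical data: three sighting stations, target truly at (30, 40) feet.
estimates = [
    ((0.0, 0.0), 36.87, 50.0),
    ((60.0, 0.0), 323.13, 50.0),
    ((30.0, 0.0), 0.0, 40.0),
]
location = perceived_location(estimates)
error_ft = miss_distance(location, (30.0, 40.0))
```

Because the perceived location depends on both the bearing and the distance estimate, a large miss distance cannot by itself distinguish poor distance estimation from disorientation, which is the limitation noted in the text.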

The  results  suggest  that  individuals  can  learn  how  to 
navigate  a real  world  route  by  training  in  a virtual 
environment.  While  the  VE  used  in  this  experiment  was 
not as effective in training subjects as the actual
building, it was much better than verbally rehearsing
route directions, even for subjects who had previously
studied a map. The effectiveness of the VE for acquiring
route knowledge was probably limited by the display's
reduced field of view, by disorientation after collisions
with virtual objects, and by an unnatural interface for
controlling movement through the VE. These factors,
along with participants' inability to judge distance in
VEs, may also have adversely affected the acquisition
of configuration knowledge.

Experiments 2-5: Judging distance in VEs

To  better  understand  why  participants  were  unable  to 
accurately  judge  distance  in  the  VE,  ARI  investigators 
conducted a series of basic research experiments in the area
(Kline & Witmer, 1996; Witmer & Kline, 1998; Witmer
& Sadowski, 1998). Kline & Witmer (1996) and Witmer
and  Kline  (1998)  used  magnitude  estimation  to  measure 
participants’  ability  to  estimate  distances  in  a VE.  The 
task  was  performed  in  a virtual  office  corridor  with 
various  floor  and  wall  patterns  and  textures.  Participants 
first  estimated  the  distance  to  a standard  stimulus  (e.g.,  a 
cylinder  at  100  feet)2.  They  received  no  feedback 
regarding  the  accuracy  of  their  distance  estimates  to  the 
standard  stimulus,  but  were  told  that  all  subsequent 
estimates  should  be  made  relative  to  that  standard. 
Actual distances varied from 1 to 12 feet in one
experiment (Kline & Witmer, 1996), from 10 to 110 feet
in another, and from 10 to 280 feet in a third (Witmer &
Kline, 1998). The basic measure for all of these
experiments,  with  the  exception  of  Witmer  and 
Sadowski  (1998),  was  the  reported  target  distance  in  feet 
or  meters.  The  amount  of  error  in  these  estimates  was 
calculated  as  the  difference  between  the  estimated  and 
true  distance  divided  by  the  true  distance.  This  error 
measurement  is  called  relative  error  because  it  is  the 
amount  of  error  relative  to  the  true  target  distance. 
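
The relative error measure defined above can be written out as a short sketch. The function name is hypothetical, and keeping the sign of the error (negative for underestimates, positive for overestimates) is an assumption, since the text does not say whether signed or absolute differences were analyzed. The example values are the 5-foot target judgments reported in this paper for the wide and narrow FOV conditions.

```python
def relative_error(estimated, true_distance):
    """(estimated - true) / true, as described in the text.

    ASSUMPTION: the sign is kept, so underestimates are negative.
    Take abs() of the result for an unsigned magnitude.
    """
    if true_distance <= 0:
        raise ValueError("true distance must be positive")
    return (estimated - true_distance) / true_distance


# Target at 5 feet, judged at 2.68 feet (wide FOV) and 8.73 feet (narrow FOV).
wide_fov_error = relative_error(2.68, 5.0)
narrow_fov_error = relative_error(8.73, 5.0)
```

Dividing by the true distance makes errors comparable across targets at different ranges, which matters here because the experiments span distances from 1 foot to 280 feet.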

Kline & Witmer (1996) investigated how accurately
stationary observers could estimate distance to a wall in
a VE as FOV, texture, and pattern were varied. The
observer's view was fixed (i.e., no head tracking). The
distances being judged were between 1 and 12 feet. The
results indicated that a wider FOV (140H x 90V
degrees) produced more accurate estimates than a
narrow FOV (60H x 38V degrees), F(2,23)=5.85, p<.01.
Distances were typically underestimated with the wide
FOV and overestimated using the narrow FOV. For
example, a target placed 5 feet from the observer was
judged to be at 2.68 feet with the wide FOV and 8.73
feet with the narrow FOV. Significant two-way
interactions of distance with texture, F(44,1054)=2.53,
p<.001, pattern, F(22,3)=14.1, p<.05, and FOV,
F(44,1054)=2.5, p<.001, indicated that these variables
affected depth perception only at the shorter distances.


2 Note:  All  distances  are  given  in  feet.  Multiply  by  .3048  to 
convert  to  meters. 
In another experiment, Witmer & Kline (1998)
investigated the effects of floor texture and pattern on
distance judgements to a cylinder for distances up to 110
feet. The observers were stationary and had a fixed view
of the target scene (i.e., no head tracking). Participants
grossly underestimated the target distance; the estimates
averaged about 50% of the true target distance. This
compares to estimates of approximately 75% of the true
distance in a comparable real world environment.
Cylinder size, F(1,22)=38.67, p<.001, distance,
F(5,18)=5.87, p<.01, and the interaction of cylinder size
and distance, F(5,18)=3.97, p<.05, significantly affected
the  magnitude  of  the  VE  estimates.  The  estimates  were 
more  accurate  for  the  small  cylinder  than  for  the  large 
cylinder.  For  example,  a target  placed  50  feet  from  the 
observer  was  judged  to  be  22.57  feet  for  the  small 
cylinder  and  18.91  feet  for  the  large  cylinder.  Floor 
texture  did  not  significantly  affect  either  the  distance 
estimates  or  the  magnitude  of  the  relative  errors. 

Witmer  & Kline  (1998)  also  reported  the  results  of  an 
experiment  in  which  moving  observers  judged  distance 
traversed  for  distances  up  to  280  feet.  Half  of  the 
participants  received  compensatory  cues  (an  audible  tone 
every  10  feet)  to  help  them  calibrate  their  distance 
judgements  to  the  true  target  distances.  Although  these 
cues  were  provided  on  only  half  of  the  trials,  they 
improved  performance  to  levels  approaching  perfect 
performance, F(1,60)=11.49, p<.001. The judgments
averaged  96%  of  the  true  target  distance  when 
compensatory  cues  were  present  but  only  67%  of  the 
target  distance  when  compensatory  cues  were  absent. 
The  mode  of  locomotion  used  in  moving  through  the  VE 
(treadmill,  joystick,  or  teleport)  did  not  significantly 
influence  the  accuracy  of  the  distance  estimates,  but 
speed  of  movement  had  a significant  impact  on 
estimation accuracy, F(1,60)=36.15, p<.001. Distance
judgments  were  more  accurate  at  the  slow  speed  than  at 
the  fast  speed.  For  example,  a distance  of  280  feet  was 
judged  to  be  267  feet  on  the  average  when  moving  at  the 
slow  speed  and  241  feet  when  moving  at  the  fast  speed. 
Accuracy  of  the  distance  estimates  generally  decreased 
as distance to the target increased, F(7,54)=482.53,
p<.001.

The  extremely  poor  VE  distance  estimates  made  by  a 
stationary  observer  and  the  lack  of  substantial 
improvement  in  the  accuracy  of  the  estimates  when 
observer  movement  was  added  (Witmer  & Kline,  1998) 
suggest that either verbal estimates of distance are not
very  accurate  or  that  VEs  degrade  distance  estimation  to 
a large  degree.  The  ability  of  participants  to  accurately 
report  distances  in  feet  or  meters  varies  widely  among 
participants,  and  may  be  independent  of  their  perception 
of  target  distance.  These  individual  differences  may 
inflate  the  amount  of  error  observed  in  estimating  target 
distance.  To  determine  how  much  of  the  problem  is  due 
to  the  requirement  to  provide  verbal  estimates  of 
distance  and  how  much  is  due  to  VE  factors,  Witmer  & 
Sadowski (1998) used non-visually guided locomotion
(NVGL) to obtain distance judgements in VE and real
world  environments.  Participants  viewed  a target  for  10 
seconds  from  a stationary  position,  forming  a mental 
image  of  the  target’s  location.  They  were  then 
blindfolded  and  asked  to  walk  to  the  target’s  location, 
keeping  the  target's  location  in  their  minds  as  they 
approached  it  and  stopping  when  they  thought  they  had 
reached  it.  They  were  asked  not  to  count  steps  or  time 
mentally.  The  distance  judgments  were  performed  both 
in a real world office corridor and in a virtual office
corridor  modeled  to  simulate  the  real  world  corridor.  The 
target,  a construction  cone,  was  clearly  visible  and 
distinct  from  the  background  at  all  distances.  Participants 
made  judgements  for  targets  placed  at  distances  between 
15  and  105  feet.  The  distance  judgements  averaged 
about  85%  of  the  true  target  distance  in  the  VE  and  92% 
of  the  true  target  distance  in  the  real  world  environment. 
The  differences  between  the  distance  judgements  in  the 
VE  and  in  the  real  world  were  significant,  however, 
F(1,20)=4.41, p<.01. The magnitude of the errors in the
VE was nearly twice that obtained in the real world.

Implications  of  the  learning  transfer  and  distance 
estimation  experiments 

Our  initial  investigation  of  configuration  learning 
(Witmer  et  al.,  1996)  suggested  that  distance  estimates  in 
VE  were  poor.  Witmer  and  Kline  (1998)  confirmed  this, 
showing  that  distance  estimation  in  a VE  is  significantly 
less  accurate  than  in  the  real  world.  Kline  & Witmer 
(1996)  demonstrated  that  reducing  the  FOV  for  one  of 
the  devices  (BOOM2C)  could  affect  not  only  the  amount 
of  error  in  distance  estimates,  but  also  the  direction  of 
that error (underestimates vs. overestimates). They
hypothesized that a narrow FOV produced less accurate
estimates  by  reducing  or  eliminating  linear  perspective 
cues.  Witmer  & Kline  (1998)  found  that  manipulation  of 
textures  did  little  to  eliminate  the  observed  deficits  in 
performance.  Although  target  size  did  influence 
performance,  manipulation  of  the  size  of  unfamiliar 
objects  is  not  a practical  solution.  Taken  together,  these 
studies  suggest  that  VEs  distort  monocular  or 
stereoscopic  distance  cues,  negatively  impacting  the 
distance  judgements  in  those  VEs. 

We  had  anticipated  that  providing  the  cues  for  distance 
associated  with  movement  would  compensate  for  the 
distortion  of  other  distance  cues  in  VE,  resulting  in 
substantial  improvements  in  performance.  However, 
Witmer  & Kline  (1998)  found  that  neither  movement 
method  nor  edge  rate  markedly  changed  the  distance 
judgments.  These  results  indicate  that  proprioceptive 
cues and visual flow cues may not play a major role in
making  distance  judgements  in  a VE.  In  contrast, 
movement  speed  clearly  influenced  distance  judgments, 
suggesting  that  the  time  spent  covering  a distance 
changes  one's  perception  of  distance  traveled.  This 
research  also  suggested  that  distance  perception  in  VE 
could  be  recalibrated  cognitively  by  providing 
compensatory  cues  for  distance.  This  cognitive 
recalibration  may  or  may  not  extend  to  other  distances  or 
to other environments, however. Witmer & Kline (1998)
did  not  collect  data  that  would  answer  questions  about 
transfer  of  estimating  skill  to  other  distances  or 
environments. 

Using  NVGL  to  evaluate  the  accuracy  of  VE  distance 
estimates  altered  our  working  hypothesis  regarding  how 
much  VE  degrades  distance  estimates.  This  procedure 
yielded  more  accurate  VE  distance  estimates,  suggesting 
that  the  use  of  verbal  distance  estimates  is  partly 
responsible  for  the  poor  performance  observed  in  our 
research. However, the magnitude of the errors in VE
using  the  NVGL  procedure  was  still  twice  that  observed 
in  the  real  world,  establishing  beyond  any  reasonable 
doubt  that  VEs  are  distorting  perceptual  judgments  of 
distance. 

Factors  influencing  VE  distance  judgements 

What  factors  might  be  responsible  for  this  distortion?  In 
our  search  for  an  explanation  it  is  important  to  remember 
that  the  performance  decrements  were  found  across 
various  VEs  using  different  display  devices,  and  with 
varying  movement  conditions.  It  is  also  important  to 
keep  in  mind  the  distances  investigated  in  each 
experiment, because the effective range of various
distance cues varies with the distance being judged.

To  understand  why  VE  distorts  distance  perception  at  the 
target  distances  investigated,  we  need  to  know  which 
distance  cues  are  effective  at  those  distances,  and  to 
assess  the  extent  to  which  these  cues  were  present  or 
absent  in  our  research.  Cutting  and  Vishton  (1995)  have 
identified  which  depth  cues  are  most  effective  at 
different  distances  and  related  these  cues  to  three 
egocentric regions or zones of space: (1) personal space
extends just beyond arm's reach and refers to space used
by a static observer; (2) action space extends to about
100 feet and includes distances over which an
observer can throw an object to another person or easily
talk to others; and (3) vista space extends beyond 100
feet. Kline & Witmer (1996) studied both personal and
action  space.  In  personal  space  the  most  important  depth 
cues  are  occlusion,  binocular  disparity,  relative  size, 
convergence  and  accommodation.  The  remaining  studies 
investigated  action  space  and  vista  space.  The  primary 
distance  cues  in  action  space  and  vista  space  are  the 
pictorial  cues,  including  occlusion,  height  in  the  visual 
field,  convergent  linear  perspective,  relative  size,  and 
relative textural density. In addition, two other distance
cues, binocular disparity and motion perspective, are
effective distance cues in action space. Note that
accommodation  and  convergence  are  not  effective  depth 
cues  in  action  space  or  vista  space. 
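
The zone scheme summarized above can be expressed as a small lookup, shown here as a sketch. The 100-foot boundary between action space and vista space and the cue lists come from the text; the numeric boundary for personal space is an assumption, because the text says only that it extends just beyond arm's reach. All function and constant names are hypothetical.

```python
ACTION_LIMIT_FT = 100.0  # from the text: action space extends to about 100 feet
PERSONAL_LIMIT_FT = 6.5  # ASSUMPTION: the text gives no number for this boundary

PICTORIAL_CUES = (
    "occlusion",
    "height in the visual field",
    "convergent linear perspective",
    "relative size",
    "relative textural density",
)

# Cue lists follow the paragraph above.
CUES_BY_ZONE = {
    "personal": ("occlusion", "binocular disparity", "relative size",
                 "convergence", "accommodation"),
    "action": PICTORIAL_CUES + ("binocular disparity", "motion perspective"),
    "vista": PICTORIAL_CUES,
}


def classify_zone(distance_ft, personal_limit_ft=PERSONAL_LIMIT_FT):
    """Map an egocentric distance in feet to personal, action, or vista space."""
    if distance_ft < 0:
        raise ValueError("distance must be non-negative")
    if distance_ft <= personal_limit_ft:
        return "personal"
    if distance_ft <= ACTION_LIMIT_FT:
        return "action"
    return "vista"


def effective_cues(distance_ft):
    """Return the cues the text lists as effective at this distance."""
    return CUES_BY_ZONE[classify_zone(distance_ft)]


# Distance ranges (feet) used in the experiments described in this paper.
zones_1_to_12 = {classify_zone(d) for d in (1, 12)}
zones_10_to_280 = {classify_zone(d) for d in (10, 280)}
```

Under these assumptions the 1 to 12 foot range spans personal and action space, while the 10 to 280 foot range spans action and vista space, which is why accommodation and convergence can matter only for the shortest distances studied.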

Witmer  & Kline  (1997)  have  shown  that  while  relative 
textural  density  influences  distance  estimates  in  VE,  its 
effects  are  typically  too  small  to  account  for  the 
differences  between  real  world  and  VE  distance 
estimation performance. Similarly, adding observer
movement, which provides motion perspective and other
movement-related cues, does not eliminate the deficits in
performance in VEs (Witmer & Kline, 1998). Research
by  Wright  (1995)  and  Witmer  & Kline  (1996)  suggests 
that  simply  using  a high  resolution  or  wide  FOV  VE 
display  cannot  erase  the  deficits  in  perceived  distance. 
Although  occlusion  is  probably  the  most  powerful  depth 
cue  in  action  space,  it  was  not  a factor  in  our  distance 
estimation  tasks.  Of  the  remaining  distance  cues  listed 
by  Cutting  & Vishton  (1995),  height  in  the  visual  field, 
convergent  linear  perspective,  relative  size,  and 
binocular  disparity  appear  to  be  the  most  likely 
candidates  for  explaining  the  observed  discrepancies 
between  VE  and  real  world  judgements  of  distance. 

The  National  Research  Council  (1997)  has  suggested 
that  the  restricted  FOV  provided  by  VE  displays  must 
degrade  height  in  the  visual  field  and  convergent  linear 
perspective  as  cues  for  distance  at  some  point.  The 
limited  vertical  FOV  found  in  most  VE  displays  (ranging 
from  40  to  90  degrees)  may  be  responsible  for  this 
degradation.  By  comparison,  the  real  world  vertical  FOV 
is  approximately  120  degrees.  A reduced  vertical  FOV 
may  result  in  distant  objects  appearing  closer  in  VE  than 
they  would  in  the  real  world  because  these  objects  would 
be  compressed  into  a smaller  visual  frame  as  they  recede 
into  the  distance.  Kline  & Witmer  (1996)  showed  that  a 
reduced  horizontal  FOV  could  also  adversely  impact  the 
accuracy  of  distance  estimates  by  reducing  or 
eliminating  linear  perspective  cues.  Because  linear 
perspective  cues  are  among  the  most  effective  distance 
cues  in  simulated  environments  (Surdick  et  al.,  1997), 
reducing  or  eliminating  these  cues  can  have  a major 
impact  on  the  accuracy  of  distance  estimates. 

In  VEs,  emulation  of  binocular  disparity  is  achieved  by 
presenting  different  images  to  the  two  eyes  with  some 
central  area  overlap.  While  this  technique  may  provide 
the  illusion  of  depth  in  VE,  it  may  not  faithfully 
reproduce real world depth. Cutting & Vishton (1995)
noted  that  early  stereoscopic  pictures  enhanced  the 
distance  between  the  eyes  to  show  large  expanses  and 
cityscapes,  diminishing  the  effective  size  of  the  objects 
seen. Relative size may be an important factor at the closer
distances  because  the  perceived  size  of  an  object 
accelerates  as  the  distance  to  the  object  decreases, 
yielding  a looming  effect.  Accommodation  and 
convergence  cues  are  not  accurate  in  VEs,  a fact  that 
researchers  often  use  to  explain  poor  distance  estimation 
in  VEs.  However,  these  cues  are  only  important  for 
judgments  in  personal  space  and  at  the  shorter  distances 
within  action  space. 

Additional  research  is  needed  to  determine  which  of  the 
distance  cues  operating  in  action  space  are  most 
responsible  for  degrading  distance  judgements  in  VE. 
Once  the  causes  of  this  degradation  are  isolated,  we  can 
begin  working  toward  a solution.  The  solution  may  be  as 
simple  as  increasing  the  VE  display  vertical  or  horizontal 
FOV,  or  adjusting  the  overlap  in  VE  stereoscopic 
viewing  devices.  On  the  other  hand,  it  may  involve 
major  technological  advances,  such  as  inventing  new 
techniques for emulating binocular disparity in VE
displays. 

Having  identified  some  of  the  factors  that  affect  distance 
judgements  in  VE,  we  turned  our  attention  back  to  how 
to  best  use  VEs  for  training  configuration  knowledge. 
Our  approach  was  to  utilize  unique  capabilities  of  VE 
that  might  compensate  for  its  inherent  deficiencies  (e.g., 
VE’s  tendency  to  distort  distance  judgements). 

Enhanced  VEs  for  spatial  knowledge  acquisition 
A computer  model  of  one  floor  of  a large  office  building, 
used  in  previous  research  (Bailey  & Witmer,  1994; 
Witmer et al., 1996) was adapted for this experiment. All
passageways  in  the  virtual  building  were  widened  to 
reduce  collisions,  an  improved  collision  detection 
algorithm  was  introduced  that  decreased  the  need  to  back 
away  from  objects  following  a collision,  and  additional 
rooms  were  modeled.  Separate  VE  models  were 
constructed  to  represent  the  standard  and  enhanced 
environments.  The  enhanced  environment  was  created 
by  adding  theme  objects  and  sounds  to  the  standard 
environment  model.  The  models  were  created  using 
Multigen  II  software  and  rendered  by  a Silicon  Graphics 
Onyx  with  eight  200MHz  processors  and  three 
RealityEngine2  Graphics  Pipes.  Both  models  were 
displayed  using  a Virtual  Research  V8  Helmet-Mounted 
Display  (HMD).  Locomotion  through  the  VE  was 
achieved  by  virtual  walking  in  the  safety  pod  shown  in 
Figure  1.  Head  and  body  movements  were  independently 
tracked. 


Figure  1:  Safety  Pod  for  Virtual  Walking 


The  participants  were  sixty-four  college  students  who 
had  no  previous  exposure  to  the  building.  Following  a 
brief  train-up,  the  participants  were  randomly  assigned  to 
one  of  eight  treatment  groups,  who  received  different 
levels  of  navigation  aids.  Depending  on  group 
assignment, a participant experienced either the standard
or enhanced VE, received orientation cues or did not,
and could choose to view the VE from an aerial
perspective  or  was  restricted  to  viewing  the  VE  from  the 
normal  perspective.  Orientation  cues  included  an  arrow 
projecting  from  the  chest  of  the  participant’s  avatar  and  a 
flagpole  visible  throughout  the  environment. 
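
The design described above crosses three two-level factors (environment, orientation cues, aerial view) to give the eight treatment groups. A sketch of one way to enumerate the groups and assign participants at random is shown below. The equal group size of eight (sixty-four participants over eight groups) is an assumption, as the text states only that assignment was random; the factor labels, function names, and seed are hypothetical.

```python
import itertools
import random
from collections import Counter

# Three two-level factors, as described in the text.
FACTORS = {
    "environment": ("standard", "enhanced"),
    "orientation_cues": (False, True),
    "aerial_view": (False, True),
}


def treatment_groups():
    """Enumerate every combination of factor levels (2 x 2 x 2 = 8 groups)."""
    names = list(FACTORS)
    return [dict(zip(names, levels))
            for levels in itertools.product(*FACTORS.values())]


def assign(participant_ids, seed=None):
    """Shuffle participants, then deal them into groups in rotation.

    ASSUMPTION: groups are kept equal in size. Returns a mapping from
    participant id to an index into treatment_groups().
    """
    groups = treatment_groups()
    ids = list(participant_ids)
    if len(ids) % len(groups) != 0:
        raise ValueError("participant count must be a multiple of group count")
    random.Random(seed).shuffle(ids)
    return {pid: i % len(groups) for i, pid in enumerate(ids)}


def group_sizes(assignment):
    """Count how many participants landed in each group."""
    return Counter(assignment.values())


assignment = assign(range(64), seed=0)
sizes = group_sizes(assignment)
```

Shuffling first and then dealing in rotation keeps the assignment random while guaranteeing balanced cells, which simplifies the factorial analyses reported later.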

Groups  having  an  aerial  perspective  could  view  the  VE 
from  heights  of  49,  98,  and  394  feet  for  a period  of  up  to 
one  minute.  After  one  minute,  they  automatically 
returned  to  the  normal  perspective  view.  The  viewing 
heights  were  selected  such  that  participants  could  see 
either  the  whole  third  floor  layout  at  once  at  394  feet  or 
parts of the layout at 49 and 98 feet. More objects in the
environment  could  be  recognized  at  the  lower  viewing 
heights.  Figure  2 shows  the  VE  from  a viewing  height  of 
98  feet.  While  in  the  aerial  mode  participants  could 
further  explore  the  environment  by  flying  to  other  aerial 
locations  (accomplished  by  walking  in  place).  To  return 
to  ground  level  they  pressed  the  thumb  button  on  their 
hand  controller,  and  gradually  descended  to  reenter  their 
virtual  body  at  the  exact  location  where  they  left  it  when 
they  started  to  fly. 


Figure  2:  Aerial  View  of  Third  Floor  Viewed  at  98  feet 

The  enhanced  environment  model  was  divided  into  four 
themed  quadrants  or  districts.  Groups  exposed  to  the 
themed  environment  encountered  sights  and  sounds 
associated  with  the  themed  quadrants.  Each  destination 
had  a memorable  theme  object  located  inside  the  room 
and  an  associated  sound  that  became  louder  as  the 
participant  approached  the  destination  room.  Additional 
theme  objects  were  positioned  along  the  building 
corridors, but no sounds were associated with these
additional  objects.  The  themes  embedded  in  the 
quadrants  were  a tropical  island  theme,  a wild  animals 
theme,  an  extraterrestrial  (or  outer  space)  theme,  and  a 
sports  theme.  Upon  encountering  a theme  object  located 
inside  one  of  the  destination  rooms,  participants  were 
asked  to  identify  the  theme  represented  by  that  object. 
This  encouraged  participants  to  associate  destination 
rooms  with  their  location  in  a particular  quadrant. 

The  orientation  cue  groups  were  asked  to  relate  their 
current  position  to  their  starting  position  marked  by  a 
virtual  flagpole.  This  was  accomplished  by  facing  the 
flagpole  upon  reaching  each  destination.  The  flagpole 
served  as  a global  orientation  cue  that  allowed 
participants  to  continually  update  their  current  position 
based  on  their  known  starting  position.  Participants  were 
told  to  use  the  arrow  projecting  from  the  chest  of  their 
avatar  as  an  indication  of  their  current  heading  and  as  a 
way  of  aligning  their  virtual  body  so  as  to  avoid 
collisions  with  walls  and  doorways. 

Individual  training  and  testing  phases  comprised  the 
research.  During  the  first  training  phase  participants 
followed  a virtual  tour  guide  through  the  VE,  pausing  at 
each  destination  room,  and  identifying  it  by  name.  The 
tour  guide  verbally  described  the  ‘non-theme  related’ 
distinguishing  features  of  each  destination.  In  the  second 
training  phase,  participants  explored  the  VE  freely,  while 
trying  to  locate  and  identify  each  previously  visited 
destination.  In  the  final  training  phase,  participants 
attempted  to  take  the  shortest  route  from  the  third  floor 
lobby  to  each  named  destination.  If  the  participants  did 
not  find  the  destination  within  three  minutes,  they  were 
verbally  guided  to  it.  Knowledge  of  the  building 
configuration  was  tested  by  asking  participants  to 
complete  the  following  tasks:  (1)  take  the  shortest  route 
between  designated  rooms,  (2)  estimate  the  distance  and 
direction  to  locations  not  in  the  line-of-sight,  and  (3) 
place  room  cutouts  in  their  correct  locations  on  a map. 
Similar  to  the  NVGL  procedure,  participants  estimated 
distance  by  walking  the  straight-line  distance  between 
their  current  location  and  the  perceived  location  of  the 
destination  without  vision.  Navigation  aids  were  not 
provided  during  the  testing  phase.  A follow-up  room 
placement  test  was  given  one  week  after  the  initial  test  to 
examine  retention  of  configuration  knowledge. 
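
To make the pointing task concrete, the sketch below scores a single trial as an absolute direction error and a signed distance error. It is an illustration under stated assumptions, not the scoring procedure used in the study: the coordinate convention (metres, bearings measured counterclockwise from the +x axis) and the function name are hypothetical.

```python
import math


def pointing_errors(origin, target, pointed_bearing_deg, walked_distance_m):
    """Score one pointing trial (hypothetical scoring, not the authors' method).

    origin, target: (x, y) positions in metres.
    pointed_bearing_deg: direction indicated by the participant, in degrees
        counterclockwise from the +x axis.
    walked_distance_m: distance walked without vision as the estimate.
    Returns (absolute angular error in degrees, signed distance error in metres).
    """
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    true_distance = math.hypot(dx, dy)
    true_bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the bearing difference into [-180, 180) so that pointing at
    # -170 degrees toward a target at 180 degrees counts as a 10 degree error.
    diff = (pointed_bearing_deg - true_bearing + 180.0) % 360.0 - 180.0
    return abs(diff), walked_distance_m - true_distance


angle_err, dist_err = pointing_errors((0.0, 0.0), (0.0, 10.0), 100.0, 8.0)
print(angle_err, dist_err)
```

A negative distance error indicates underestimation, the direction of bias the paper reports for distance judgments in VEs.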

The  purpose  of  the  navigation  aids  was  to  offset  the 
effects  of  VE  deficiencies  that  interfere  with  the 
acquisition  of  configuration  knowledge  in  a VE.  The 
orientation  cues  had  no  significant  effects  on 
configuration  knowledge  acquisition,  F(4,51)=2.05, 
p=.10.  Participants  receiving  the  enhanced  environment 
performed  better  during  training  than  those  who  received 
the  standard  environment,  F(4,51)=2.80,  p<.05,  but  not 
on  the  tests  of  configuration  knowledge.  Only  the 
participants  who  received  an  aerial  perspective  view 
performed  significantly  better  both  during  training, 
F(4,51)=5.69,  p<.001,  and  on  the  configuration 
knowledge  tests,  F(6,50)=3.44,  p<.01.  Participants  with 
an  aerial  view  during  training  also  performed  better  on 
the  1-week  retention  test,  F(1,51)=9.76,  p<.01. 

The  effectiveness  of  the  navigation  aids,  including  the 
aerial  view,  seemed  to  depend  on  how  the  participants 
used  the  aids.  When  the  aids  were  used  as  a crutch  to 
quickly  find  a room,  they  were  not  effective.  Similarly,  in 
those  cases  where  the  navigation  aids  increased  the 
workload  beyond  what  the  participants  could  handle,  no 
performance  gains  were  realized.  The  navigation  aids 
seemed  to  work  best  when  participants  were  able  to  use 
them  to  mentally  structure  the  environment.  For 
additional  discussion  of  the  effects  of  these  navigation 
aids,  see  Witmer,  Sadowski,  and  Finkelstein  (in  press). 

Conclusions 

What  then  must  be  done  to  ensure  that  training  in  virtual 
environments  meets  military  human  performance  goals? 
The  first  step  is  to  identify  the  shortcomings  of  VE  that 
adversely  affect  VE  training  effectiveness  and  link  these 
shortcomings  to  specific  performance  deficiencies.  For 
example,  in  spatial  learning,  a reduced  FOV  in  VE  was 
linked  to  poor  distance  estimation  and  spatial 
disorientation,  ultimately  impairing  the  acquisition  of 
route  and  configuration  knowledge.  The  next  step  is  to 
determine  if  the  deficiency  can  be  addressed  directly,  or 
if  not,  how  to  compensate  for  the  deficiency.  Currently, 
increasing  the  FOV  for  VE  displays  is  an  expensive 
proposition  and  large  FOV  devices  may  sacrifice 
resolution  for  the  larger  FOV.  We  used  auditory  cues  to 
compensate  for  poor  distance  estimation  in  the  VE  and 
showed  that  the  estimates  were  improved  even  when  the 
cues  were  not  present.  We  adopted  the  NVGL  procedure 
to  reduce  the  effects  of  individual  differences  on  distance 
estimation  tasks,  and  used  it  to  measure  distance  in  the 
projective  convergence  test.  We  took  steps  to  reduce 
collisions  in  VE,  thereby  reducing  the  amount  of 
disorientation  that  occurred  with  a narrow  FOV  display. 
We  also  increased  the  effective  FOV  by  providing 
participants  with  an  aerial  view,  leading  to  improved 
acquisition  of  configuration  knowledge.  In  searching  for 
effective  compensatory  mechanisms,  some  promising 
factors  had  little  practical  effect.  A more  realistic 
walking  interface  (i.e.,  a treadmill)  did  not  improve 
distance  estimates  and  dividing  the  environment  into 
themed  quadrants  or  districts  did  not  improve  the 
performance  on  tests  of  configuration  knowledge.  This 
demonstrates  the  importance  of  evaluating  VE  interfaces 
and  training  enhancements  in  controlled  experiments 
before  implementing  them  in  military  training 
environments. 